ABSTRACT

This is the second paper of two reporting results from a study of the H I 
content and stellar properties of nearby galaxies detected by the Arecibo Legacy 
Fast ALFA blind 21-cm line survey and the Sloan Digital Sky Survey in a 2160 
square degree region of high galactic latitude sky covered by both surveys, in 
the general Virgo direction. Here we analyze a complete HI flux-limited subset 
of 1624 objects with homogeneously measured 21-cm and multi-wavelength optical
attributes extracted from the control sample of H I emitters in environments
of low local galactic density assembled by Toribio et al. (2010). Strategies of
multivariate data analysis are applied to this dataset in order to: i) investigate 
the correlation structure of the space defined by an extensive set of potentially 
independent observables describing gas-rich systems; ii) identify the intrinsic
parameters that best define their neutral gas content; and iii) explore the scaling
relations arising from the joint distributions of the quantities most strongly
correlated with the H I mass. The principal component analysis performed over a set
of five galaxy properties reveals that they are strongly interrelated, supporting 
previous claims that nearby H I emitters show a high degree of correlation. The 
best predictors for the expected value of $M_{\rm HI}$ are the diameter of the stellar disk,
$D_{25,r}$, followed by the total luminosity (both in the r-band), and the maximum
rotation speed, while morphological proxies such as color show only a moderately
strong correlation with the gaseous content attenuated by observational
error. Among the various inferred prescriptions, the simplest and most accurate
is $\log(M_{\rm HI}/M_\odot) = 8.72 + 1.25 \log(D_{25,r}/{\rm kpc})$. We find a slope of -8.2 ± 0.5 for
the relation between optical magnitude and log rotation speed, in good agree- 
ment with Tully-Fisher studies, as well as a log slope of 1.55 ± 0.06 for the HI 
mass-optical galaxy size relation. Given the homogeneity of the measurements 
and the completeness of our dataset, the latter outcome suggests that the con- 
stancy of the average (hybrid) H I surface density advocated by some authors for 
the spiral population is just a crude approximation. 

Subject headings: PACS: 02.50.Sk, 98.52.Nr, 98.62.Ai, 98.62.Lv, 98.62.Qz, 
98.62.Ve 



INTRODUCTION 



While the literature abounds with attempts to improve our knowledge of the formation
and evolution of galaxies from the cross-correlation of the main properties of their
baryonic components (e.g., Gavazzi, Pierini, & Boselli 1996; Rosenberg, Schneider, &
Posson-Brown 2005; Garcia-Appadoo et al. 2009, to name a few representative examples),
the possibility of using this sort of relationship to set reference standards for the
H I content has received comparatively less attention. Early comparisons of the neutral
hydrogen abundance between Virgo cluster and field galaxies put the emphasis on using
distance-independent measures, such as the $M_{\rm HI}/L$ and $M_{\rm HI}/D^2$ ratios
(e.g., Davies & Lewis 1973; Chamaraux et al. 1980), $L$ and $D$ being, respectively, the
optical luminosity and intrinsic linear diameter at a certain wavelength. Haynes &
Giovanelli (1984, hereafter HG84) were the first both to carry out an objective
evaluation of the performance of different diagnostic tools for the H I content and to
provide a rigorous operational definition of this quantity. With the help of a control
sample of 288 galaxies with 21-cm line emission belonging to the Catalogue of Isolated
Galaxies (CIG; Karachentseva 1973), these authors demonstrated that, whatever the Hubble
type, the optical linear diameter is the most important diagnostic tool for the HI mass
of galaxies. New expressions for the standards of HI content according to HG84's
definition were later derived in an unbiased way by Solanes, Giovanelli, & Haynes (1996)
from a larger, integrated H I flux-limited sample of 532 galaxies from the Catalog of
Galaxies and Clusters of Galaxies (CGCG; Zwicky et al. 1961-1968) located in the lowest
density environments of the Pisces-Perseus supercluster region.

One serious limitation of these studies, carried out at a time when wide-field redshift
surveys were still in their infancy, is that they had to deal with heterogeneous datasets
of optically selected targets assembled from incomplete catalogs and affected by complex
sampling biases that undermine the validity of the results. In this paper and the
accompanying one (Toribio et al. 2010, hereafter Paper I), we conduct a systematic
analysis of the main structural properties of galaxies selected according to their
H I-line emission, which in terms of both sampling quality and statistics represents a
significant improvement with respect to earlier studies of this kind. Our study grows out
of the combination of data from two large surveys that homogeneously map the distribution
of extragalactic sources over a significant fraction of the local universe. These are a
compilation of all the data from the ongoing Arecibo Legacy Fast ALFA (ALFALFA) blind
21-cm line survey (Giovanelli et al. 2005) gathered so far in the northern Galactic
hemisphere, which contains H I measurements distributed in two separate regions of the
high Galactic latitude sky that cover a z-space volume of about 2160 deg$^2$ x 18,000
km s$^{-1}$, and the Seventh Data Release of the Sloan Digital Sky Survey (SDSS DR7;
Abazajian et al. 2009), which is complemented with additional data from the NYU
Value-Added Galaxy Catalog (NYU-VAGC; Blanton et al. 2005).



In Paper I, we deal with the assembly of control samples of ALFALFA galaxies that are
expected to show little or no evidence of interaction with their surroundings and,
therefore, that are suitable for providing absolute measures of the HI mass. According to
the results of this study, the optimal dataset to set up standards for the neutral gas
content of galaxies is a sample of 5647 H I emitters found in regions of low local
galactic density, as defined by a nearest neighbor approach. A complete 21-cm
flux-limited subset of this control sample will be used here (Section 2) with the aim of
exploring inter-variable linear correlations^1 and determining the combinations of
intrinsic properties that best define the H I mass of gas-rich objects. In the remainder
of the present manuscript, we will show first that the galactic stellar size, luminosity,
rotation speed, and to a lesser extent the color, are the intrinsic factors most closely
related to the HI mass (Section 3). Then, we will apply strategies of non-parametric
multivariate data analysis to these variables in order to: determine their correlation
structure and the number and orientation of the statistically significant principal
components (Section 4.1); establish standards of normalcy for the HI content of galaxies
(Section 4.2); and examine the constraints that can be inferred from these relationships
on the most firmly established empirical scaling laws for disk galaxies (Section 4.3).
Appendix A briefly discusses two statistical tests implemented in order to assess the
completeness limits of the ALFALFA data as a function of integrated H I flux.

^1 The correlation analysis carried out in this work requires that we ignore any possible
curvature in the relationships investigated.

The luminosity distances needed to calculate the various distance-dependent structural
properties used in this investigation have been inferred within the framework of the
standard flat $\Lambda$CDM concordance cosmology with a reduced Hubble constant
$h = H_0/(100\ {\rm km\ s^{-1}\ Mpc^{-1}}) = 0.7$.



2. SAMPLE SELECTION 

We use a trimmed version of the Low-Density Environment (LDE) HI galaxy sample 
assembled in Paper I. The original LDE sample consists of 5647 reliable ALFALFA detections 
with a signal-to-noise ratio S/N > 4.5 that have an optical counterpart in the SDSS catalog 
and that inhabit regions of low local galactic density ($\rho_6 < 0.5$ galaxies Mpc$^{-3}$; see Paper
I for details). In order to deal with data of the highest completeness and quality, the present 
analysis has been restricted, however, to those HI sources designated code 1, i.e., with an
S/N > 6.5, a clean spectral profile, and a good match between the two polarizations
independently observed by ALFALFA, for which it can be assumed that the completeness limit is
well represented by the detection limit of the survey. Tests of the performance of the signal
detection pipeline made by Saintonge (2007) estimate a reliability close to 100% for ALFALFA
objects above the prescribed S/N threshold and an overall completeness approaching ~90%
for those with a narrow observed velocity width (< 150 km s$^{-1}$). In contrast, for the few
code 2 sources originally included in the LDE dataset, which have 4 < S/N < 6.5, the
reliability for detections of narrow signals is reduced to values near 60% and the overall
completeness to ~70%.

Likewise, we have removed from the parent LDE sample those galaxies located beyond
15,000 km s$^{-1}$, where the ALFALFA survey detection ability drops significantly due to a
strong radio frequency interference (RFI) signal from the San Juan airport FAA radar. (We
remind the reader that the original LDE dataset is already confined to objects beyond 2000
km s$^{-1}$ in order to eliminate the sources with the most uncertain distances and, in
particular, a great deal of the kinematic influence of the Virgo cluster.)



The large size of the LDE sample has also allowed us to be demanding when selecting
galaxies with favourable inclination angles. Thus, we have taken into account that at low
apparent inclinations orientation estimates become more uncertain and are skewed towards
larger values by non-axisymmetric features in the images, ultimately leading to deprojected
properties with divergent errors as a face-on orientation is approached, an issue well known
in Tully-Fisher studies (e.g., Toribio & Solanes 2009). Furthermore, the extinction and
reddening corrections that must be applied to optical observables are stronger and less
reliable for nearly edge-on objects. Accordingly, we have chosen to restrict the present
analysis to the 3032 LDE galaxies that, in addition to satisfying the above conditions, also
have an isophotal r-band axial ratio $0.85 > (b/a)_r > 0.2$ (equivalent to 30° < i < 80° for
disks of negligible thickness). A detailed analysis of the inclination dependence of the
different relationships examined in this work corroborates that all the correlation
coefficients vary little within this range of b/a. Furthermore, we have attempted to reduce
as much as possible the impact of possible outliers on the inferred relationships, which
could be significant especially for low HI-mass objects, by excluding galaxies with highly
inaccurate SDSS measurements. Although the restrictions on the axial ratio already discard
most of the potential optical outliers, an additional ~7 per cent of the remaining sources
were eliminated for this reason.
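
The correspondence between the axial-ratio cut and the quoted inclination range can be checked with a short sketch. It assumes the standard oblate-spheroid relation cos^2 i = [(b/a)^2 - q^2] / (1 - q^2), with q = 0 for a disk of negligible thickness; the function names are hypothetical and are not taken from the paper.

```python
import math

def inclination_deg(axial_ratio: float, q: float = 0.0) -> float:
    """Inclination (degrees) of an oblate disk from its apparent axial ratio b/a.

    q is the intrinsic (edge-on) axial ratio; q = 0 is an infinitely thin disk.
    """
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    if not 0.0 < axial_ratio <= 1.0:
        raise ValueError("b/a must lie in (0, 1]")
    if axial_ratio <= q:
        return 90.0  # flatter than the assumed intrinsic thickness: treat as edge-on
    cos2_i = (axial_ratio**2 - q**2) / (1.0 - q**2)
    return math.degrees(math.acos(math.sqrt(cos2_i)))

def passes_inclination_cut(axial_ratio: float, lo: float = 0.2, hi: float = 0.85) -> bool:
    """Axial-ratio window quoted in the text: 0.85 > (b/a)_r > 0.2."""
    return lo < axial_ratio < hi

if __name__ == "__main__":
    for ratio in (0.85, 0.2):
        print(f"b/a = {ratio:.2f} -> i = {inclination_deg(ratio):.1f} deg")
```

For a thin disk the two limits map to inclinations of roughly 32° and 78°, consistent with the rounded 30° < i < 80° range given above.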

Last but not least is the fact that ALFALFA is a noise-limited survey, with a sensitivity
that depends on the observed source's HI linewidth, while the correlations that we want to
study ideally ought to be inferred from a volume-limited sample. To sidestep the natural
bias of blind 21-cm surveys against sources with low fluxes and large velocity widths, we
will be dealing with a subset of the LDE sample that includes objects above a stringent
integrated HI flux limit, $F_{\rm HI} > 1.3$ Jy km s$^{-1}$, for which the survey can be
considered complete in a statistical sense regardless of line width (see Appendix A). Each
galaxy in this restricted dataset will be weighted by the inverse of the maximum effective
volume, $V_{\rm max}$, in which it should, on average, have been detected. To calculate the
latter, we have taken into account not just the observed integrated flux of the source, its
distance, and the adopted sensitivity limit, but also the presence of large-scale structure
in the surveyed volume and the loss of signal that results from man-made RFI, which locally
alter the survey completeness and therefore the detection probability of the sources. In the
remainder of the paper, we will refer to this selection as our H I flux-limited LDE
'high-quality' galaxy sample (LDE-HQ for short), which totals 1624 gas-rich objects.
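
The inverse-volume weighting described above can be illustrated with a deliberately simplified sketch. It is not the authors' procedure: it assumes Euclidean geometry and pure Hubble-flow distances (d = v / H0), and it ignores the large-scale structure and RFI corrections that the paper applies. The constants come from values quoted in the text; the function names are hypothetical.

```python
import math

H0 = 70.0              # km/s/Mpc, i.e. h = 0.7 as adopted in the text
AREA_DEG2 = 2160.0     # sky coverage quoted in the text
FLUX_LIM = 1.3         # Jy km/s, integrated-flux completeness limit
D_MIN = 2000.0 / H0    # Mpc, inner velocity cut of the LDE sample
D_MAX = 15000.0 / H0   # Mpc, outer velocity cut of the LDE sample

def solid_angle_sr(area_deg2: float) -> float:
    """Convert a sky area in square degrees to steradians."""
    return area_deg2 * (math.pi / 180.0) ** 2

def v_max(flux: float, distance: float) -> float:
    """Volume (Mpc^3) within which the source would remain above FLUX_LIM."""
    # Flux scales as distance^-2, so the source could be moved out to d_lim.
    d_lim = distance * math.sqrt(flux / FLUX_LIM)
    d_lim = min(max(d_lim, D_MIN), D_MAX)   # clip to the surveyed shell
    return solid_angle_sr(AREA_DEG2) / 3.0 * (d_lim**3 - D_MIN**3)

def weight(flux: float, distance: float) -> float:
    """1/V_max weight; zero if the accessible volume vanishes."""
    volume = v_max(flux, distance)
    return 1.0 / volume if volume > 0.0 else 0.0
```

In this toy version, faint nearby sources receive the largest weights, while every source bright enough to be seen out to the outer velocity cut shares the same minimum weight.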



3. PARAMETER SELECTION 

Central to this work is the search for dependencies between the neutral gas mass content
and other intrinsic properties of H I emitters. We have selected the largest possible number
of observational parameters that, besides being suitable to characterize gas-rich objects, are
inferred from good quality measurements and either publicly available or easy to compute.



3.1. Available Parameters 



In our quest for the most relevant galaxian properties that define the HI content, we
first review the extensive set of radio and optical parameters that can be inferred from the
observables listed in the ALFALFA, SDSS DR7, and NYU-VAGC catalogs, which in some
cases provide different estimates of the same attribute (e.g., Petrosian and model magnitudes
to measure brightness, isophotal diameter and Petrosian radius to measure the angular size,
and so on). After a first screening of all the possible variables available, paying attention
to factors such as the relative size of the observational errors, as well as the suitability and
robustness of the measurements for the kind of galaxies under scrutiny (i.e., blue, late-type,
star-forming systems), we select the following properties as the most convenient for our study.



The two main measures that can be obtained from ALFALFA observations: the 21-cm
linewidth of the source at the 50% level of the two peaks, $W_{50}$, and the H I mass, which
is estimated from the equation

$M_{\rm HI} = 2.356 \times 10^{5}\, d^{2} F_{\rm HI}$ ,    (1)

where $F_{\rm HI}$ is the 21-cm line flux integral expressed in Jy km s$^{-1}$ and $d$ is the
cosmological distance to the source in Mpc calculated from the multiattractor flow model of
local peculiar velocities developed by Masters (2005). Examination of the variation of
HI surface density with axial ratio for ALFALFA galaxies has led us to neglect the
effects of internal HI self-absorption on $F_{\rm HI}$. Note that in this paper $W_{50}$
represents the intrinsic width corrected not only for inclination, but also for the effects
of redshift broadening and turbulence (see Springob et al. 2005).

Due to the absence of reliable estimates of the intrinsic axial ratio $q$ of the targets,
we choose to use the r-band isophotal axial ratio $(b/a)_r$ from the SDSS as a proxy
for inclination in the calculation of the intrinsic linewidths. The error on inclination
corrections arising from taking $q = 0$ is in general negligible, except for nearly edge-on
objects, which have been removed from our dataset (see Section 2).

The luminosities (and their corresponding absolute magnitudes) in the five SDSS bands
from Petrosian apparent magnitudes. The latter, which are especially suited for bright,
extended objects, recover almost all the light from late-type galaxies and around
80% for early types (Blanton et al. 2001). Absolute magnitudes have been corrected
to face-on values following Shao et al. (2007). They are not corrected for the seeing
effect.

The 25 mag arcsec$^{-2}$ isophotal major-axis diameter in the five SDSS bands,
which provides a more continuous measure of the scale of galaxies than the Petrosian
radii $R_{50}$ and $R_{90}$. The minimum velocity cutoff of 2000 km s$^{-1}$ adopted in Paper I
when defining the LDE sample leaves out of the present analysis more than half of the
mostly nearby, blue faint objects having unrealistically small isophotal angular radii
in the SDSS database.

Isophotal linear diameters have been corrected for inclination by using transformations
of the form

$\log D_{25} = \log D_{25}^{\rm obs} + \beta \log(b/a)$ ,    (2)

where $D_{25}$ and $D_{25}^{\rm obs}$ are, respectively, the intrinsic and observed values of
this variable in a given band, whereas the coefficient $\beta$ measures the strength of the
corresponding attenuation. We note that our corrections are somewhat stronger than those
estimated by Maller et al. (2009) from equation (2) for $R_{50}$. For instance, we obtain
$\beta_r = 0.35$ in the r-band, while the attenuation for $R_{50,r}$ calculated by the latter
authors is 0.20.

The colors from model magnitudes. Here, we explore the combinations $(u-g)$, $(g-r)$,
$(r-i)$, $(i-z)$, $(u-r)$, $(u-i)$, $(u-z)$, $(g-i)$, $(g-z)$, and $(r-z)$.

The effect of inclination on the colors of our galaxies has been corrected by using
the variation of color with axial ratio quoted in Masters et al. (2010). For galaxies
with $\log(a/b) < 0.7$, we use the straight-line fits listed in their Equation (3), inferred
for Galaxy Zoo (GZ) spirals with $R_{90} > 10''$, and a constant extension of these fits
for objects with $\log(a/b) > 0.7$, as measured in the r-band. We have not applied the
strongest color corrections derived by Masters et al. for systems with smaller apparent
sizes, because we have verified that the adopted transformation is enough to make the
average intrinsic $(g-r)$ color of ALFALFA sources independent of viewing angle.

The Sersic index, $n$, from the NYU-VAGC catalog, which measures the shape of the
observed r-band luminosity profile of a galaxy fitted using the Sersic $R^{1/n}$ formula with
elliptical isophotes. Available only for galaxies with r < 18 mag.

The (inverse) index of light concentration, $C_{59} = R_{50}/R_{90}$, in the five SDSS bands,
which ranges from 0 to 1 and is available for the full SDSS DR7 dataset. Not corrected
for seeing.
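
Equations (1) and (2) from the list above are simple enough to sketch in a few lines of Python. The snippet is illustrative only and is not the authors' code: the function names are hypothetical, Equation (2) is taken in the form log D25 = log D25_obs + beta * log(b/a), and the default beta = 0.35 is the r-band value quoted in the text.

```python
import math

def hi_mass(flux_jy_kms: float, distance_mpc: float) -> float:
    """Equation (1): HI mass in solar masses from flux (Jy km/s) and distance (Mpc)."""
    return 2.356e5 * distance_mpc**2 * flux_jy_kms

def intrinsic_log_diameter(log_d_obs: float, axial_ratio: float, beta: float = 0.35) -> float:
    """Equation (2): inclination-corrected log10 of the isophotal diameter.

    axial_ratio is the apparent b/a (<= 1), so the correction is zero for a
    face-on galaxy and shrinks the diameter of an inclined one.
    """
    if not 0.0 < axial_ratio <= 1.0:
        raise ValueError("b/a must lie in (0, 1]")
    return log_d_obs + beta * math.log10(axial_ratio)

if __name__ == "__main__":
    # A source at the flux limit (1.3 Jy km/s) and at the inner velocity cut
    # (2000 km/s with H0 = 70 km/s/Mpc) has the smallest mass in the sample.
    print(f"{hi_mass(1.3, 2000.0 / 70.0):.3e} solar masses")
```

Evaluating Equation (1) at the flux limit and the inner velocity cut gives a few times 10^8 solar masses, which sets the lower mass bound of the flux-limited sample under these simplifying assumptions.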



Along with these variables, we also include the isophotal $(b/a)_r$, which given its apparent
nature should be uncorrelated with any of the former quantities and can act therefore as a
control parameter. The values of this axial ratio quoted in the SDSS are not corrected for
seeing, which makes the most inclined galaxies appear rounder than they should be. We
note, however, that the possible impact of seeing in our data is likely to be insignificant since
its effects are only noticeable for small, highly-inclined galaxies, which are mostly excluded
from our sample due to the adopted integrated flux cut (the members of the LDE-HQ dataset
are regular gas-rich galaxies with $M_{\rm HI} > 10^8 M_\odot$). Besides, blind HI surveys in
which the detection probability depends on the observed linewidth, like ALFALFA, are naturally
biased against this kind of target (compare, for instance, the data point densities in our
corresponding figure with those in Figure 1 of Masters et al. (2010), derived for GZ spirals).



Note also that in the present analysis, the color, as well as the Sersic and light con- 
centration indexes, are used as objective proxies of morphology. Attempts to work with 
'classical' indicators of morphological type, such as the continuous de Vaucouleurs numerical 
code listed in the HyperLeda database, have been thwarted by lack of completeness: about 
half of the SDSS galaxies do not have information on their Hubble types, while the same is 
true for ~ 16% of the ALFALFA sources. 

In order to examine the correlation structure, we feed in the logarithms of all these
basic variables but $C_{59}$ and $(b/a)_r$. In this manner, we should be able to find any scaling
law that might exist among them (this is also a must when variables have a lognormal
distribution). This means, in particular, that it is not necessary to explicitly include in the
analysis interesting composite parameters, such as the stellar mass given by Gavazzi et al.
(2008), $\log(M_*/M_\odot) = -0.152 + 0.518\,(g-i) + \log L_i$ (a similar formula based on the
$(g-r)$ color is provided by Bell et al. 2003), that are linear combinations of (the logarithm
of) two or more single input variables. This is also the case for the mean surface brightness,
isophotal or Petrosian, $\mu$ ($= M + 5\log R$), which can likewise be expressed as a function
of two of the measurements listed above.



3.2. Relevant Parameters 

Before we proceed with the present study there is, however, an important consideration
that must be taken into account. It has to do with the fact that, on a t-test of significance,
the large size of our sample results in very low critical values for Pearson's correlation
coefficients, $r_P$, i.e., for the elements of the various correlation matrices that are inferred
in this section. For instance, the critical value is $|r_P| \simeq 0.06$ for a level of
significance of 0.01 on a two-tailed test with about 1600 degrees of freedom. This implies that
variables that ideally should be essentially independent of each other might end up exhibiting
a very weak, but nonetheless statistically significant, linear relationship according to this
test, even in the presence of attenuation from measurement errors. For this reason, as well as
to avoid working with an excessive number of parameters, we have decided to consider that two
given properties are truly connected in practice only if they have a Pearson's r that, aside
from being statistically significant, is also minimally strong ($|r_P| > 0.3$).
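
The critical value quoted above follows from the relation between the t statistic and the correlation coefficient, r = t / sqrt(t^2 + dof). The sketch below is a minimal illustration that replaces the Student's t quantile with its normal approximation, which is adequate for about 1600 degrees of freedom; the function names and the combined relevance test are hypothetical conveniences, not part of the paper.

```python
import math
from statistics import NormalDist

def critical_r(dof: int, alpha: float = 0.01) -> float:
    """Smallest |r| that is significant at level alpha on a two-tailed test.

    Uses the normal quantile in place of the Student's t quantile, a
    large-sample approximation that is adequate for dof of order 1000.
    """
    t_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return t_crit / math.sqrt(t_crit**2 + dof)

def is_relevant(r: float, dof: int, alpha: float = 0.01, r_min: float = 0.3) -> bool:
    """Criterion adopted in the text: significant AND minimally strong."""
    return abs(r) > max(critical_r(dof, alpha), r_min)

if __name__ == "__main__":
    print(f"critical |r| for 1600 dof: {critical_r(1600):.3f}")
```

With 1600 degrees of freedom the significance threshold is far below 0.3, so in practice the strength requirement is the binding condition.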

The correlation matrices inferred from different subsets of the attributes selected in
Section 3.1 demonstrate that colors and luminosities (absolute magnitudes) are strongly
correlated among themselves, as was to be expected. This allows us to discard all but one
of the colors and all but one of the luminosities. Among these photometric variables, the
ones showing the largest correlation coefficients with the HI mass are the g, r, and i band
luminosities, as well as all the optical colors that can be obtained from combinations of them.
Given that the photometric errors in these three bands are rather similar (Strateva et al.
2001), we have taken into account both the dynamic ranges of the different colors and the
economy in the number of involved bands to finally select the r-band luminosity and the
$(g-r)$ color as the most adequate representatives of these two fundamental stellar properties.

The correlation analysis has also shown that the available measurements of the
isophotal diameter and the Sersic index in the five SDSS bands are degenerate. As before,
the superior quality of the SDSS photometry in the central g, r, and i bands results in
somewhat stronger correlations of these variables with H I mass at such wavelengths. Consistency
with the adopted bandpass for measuring the amount of light has led us to select r-band
estimates of $D_{25}$ and $n$ to represent these quantities too.

Overall, we find that the isophotal r-band diameter, the r-band luminosity, and the $W_{50}$
linewidth are tightly related to $M_{\rm HI}$. A second group of attributes, the $(g-r)$ color
and the r-band Sersic index, are moderately aligned with this quantity ($|r_P| \sim 0.3$-0.5; see
Disney et al. 2008 for a similar conclusion regarding color). While we have decided to retain
$(g-r)$ in the set of parameters that may be needed to describe the cold gas content of a
galaxy, we have chosen to omit from this list the Sersic index, as it has a somewhat smaller
$|r_P|$ and is available only for galaxies in the NYU-VAGC catalog (i.e., with r < 18 mag).
Finally, the third morphological separator selected, the index of light concentration, shows
little or no indication of a major dependence on the HI mass, exhibiting a Pearson's r of
similar strength ($|r_P| < 0.2$) to, for instance, that of correlations involving the apparent
inclination, so it will also be discarded in the remainder of this study.



4. RELATIONSHIPS AMONG HI AND OPTICAL PROPERTIES

We now complete the characterization of gas-rich galaxies by circumscribing ourselves 
to the final set of five non-degenerate, intrinsic properties selected in the previous section. 

To begin with, we will examine in detail the correlation structure of this multivariate 
space to both identify its most statistically significant principal components and their de- 
gree of alignment with the selected observed parameters. We will then use the correlations 
among the measured variables to derive formulae for predicting the neutral hydrogen content 
according to two different approaches. On the one hand, we will seek the most probable
value of $M_{\rm HI}$ assuming the other four properties are precisely known. This involves solving
a multivariate regression problem in order to obtain equations useful to establish standards 
of normalcy for the H I content of galaxies — we shall define as 'normal' the typical values of 
our LDE-HQ galaxy sample members — from a set of diagnostic parameters easily accessible 
to observation. On the other hand, we will also determine the best-fitting axes of all the 
different pairs that can be built from these five basic attributes and the constraints they 
impose on the scaling laws connecting fundamental properties of galaxies. 



4.1. PCA Results

Here we deal with the correlation matrix of the variables $\log(M_{\rm HI}\ [M_\odot])$,
$\log(W_{50}\ [{\rm km\ s^{-1}}])$, $\log(D_{25,r}\ [{\rm kpc}])$, $M_r$ [mag], and $(g-r)$ [mag] in
order to perform a principal component analysis (PCA) in this parameter space. Unlike the
covariance matrix, the use of the correlation matrix entails the standardization of the
original variables, putting them on an equal footing: all samples of variables are set to have
zero mean and unit standard deviation. This scaling of the measurements avoids the creation of
spurious interrelations arising from the preponderance (i.e., larger dynamic range) of certain
properties. For the PCA calculations, we employ the IDL procedure pca.pro^2, implemented
following the description by Murtagh & Heck (1987). This algorithm has been modified in order
to account for the flux limitation of our sample and the measurement error of the estimates.
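
The role of standardization can be made concrete with a small, dependency-free sketch of a PCA on a (optionally weighted) correlation matrix. It is an illustration under stated assumptions and not a port of pca.pro: the function names are hypothetical, only the leading principal component is extracted, and power iteration stands in for a full eigendecomposition.

```python
import math

def standardize(columns, weights=None):
    """Scale each variable to zero weighted mean and unit weighted standard deviation."""
    n = len(columns[0])
    w = list(weights) if weights else [1.0] * n
    wsum = sum(w)
    scaled = []
    for col in columns:
        mean = sum(wi * x for wi, x in zip(w, col)) / wsum
        var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, col)) / wsum
        scaled.append([(x - mean) / math.sqrt(var) for x in col])
    return scaled, w

def correlation_matrix(columns, weights=None):
    """Weighted Pearson correlation matrix of the input variables."""
    z, w = standardize(columns, weights)
    wsum = sum(w)
    p = len(z)
    return [[sum(wk * a * b for wk, a, b in zip(w, z[i], z[j])) / wsum
             for j in range(p)] for i in range(p)]

def first_principal_component(corr, n_iter=500):
    """Leading eigenvalue and eigenvector of a correlation matrix by power iteration."""
    p = len(corr)
    # Asymmetric start vector, so that it is not orthogonal to the leading
    # eigenvector when some variables are anti-correlated with the rest.
    v = [1.0 + 0.1 * i for i in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eigenvalue = 0.0
    for _ in range(n_iter):
        u = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        eigenvalue = math.sqrt(sum(x * x for x in u))
        v = [x / eigenvalue for x in u]
    return eigenvalue, v

if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0, 4.0, 5.0],
            [2.1, 3.9, 6.2, 7.8, 10.1],
            [5.2, 3.9, 3.1, 1.8, 1.1]]
    lam, vec = first_principal_component(correlation_matrix(data))
    print(f"fraction of variance in PC1: {lam / len(data):.3f}")
```

Because the trace of a correlation matrix equals the number of variables, the leading eigenvalue divided by that number is the fraction of the total variance carried by the first principal component. Passing inverse-volume weights as `weights` gives a weighted correlation matrix in the spirit of the flux-limit correction mentioned above.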

^2 http://idlastro.gsfc.nasa.gov/ftp/pro/math/pca.pro

As stated in Section 2 (see also Appendix A), the integrated flux cutoff of the LDE-HQ
sample has been compensated by weighting each of its member galaxies by the inverse of the
maximum volume, $V_{\rm max}$, in which it could have been observed, corrected for the systematic
effects of large-scale structure and loss of signal due to RFI. This correction is larger for
the lowest HI-mass objects since they are only detected at short distances^3. We have also
taken into account that the correlations between variables are weakened in the presence of
measurement error. Given two sets of estimates $X$ and $Y$ with independent measurement
errors, the disattenuated correlation between the underlying variables $X'$ and $Y'$ can be
obtained from the formula (cf. Spearman 1904)

$r_{X'Y'} = \frac{r_{XY}}{\sqrt{R_{XX} R_{YY}}}$ ,    (3)

where the reliability coefficients $R_{XX}$ and $R_{YY}$ are defined as one minus the ratio between
the variance of the corresponding measurement error and the total observed variance. In
practice, this means that variables are standardized using an estimate of their true standard
deviation, instead of the observed one. We have followed Fuller & Hidiroglou (1978) (see also
Bock & Petersen 1975) and extended this correction to the multivariate case by imposing
the constraint of producing a valid correlation matrix for our disattenuated variables, i.e.,
a matrix that is at least positive-semidefinite, after checking that their associated measured
error scores are essentially independent.
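
A minimal sketch of the bivariate correction for attenuation follows. It assumes that the measurement-error variance of each observable is known; the function names are hypothetical, and the multivariate positive-semidefinite constraint described above is not implemented here.

```python
import math

def reliability(observed_var: float, error_var: float) -> float:
    """Reliability coefficient: 1 - (measurement-error variance) / (observed variance)."""
    if not 0.0 <= error_var < observed_var:
        raise ValueError("error variance must be non-negative and below the observed variance")
    return 1.0 - error_var / observed_var

def disattenuate(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction: estimated correlation between the underlying variables."""
    return r_observed / math.sqrt(rel_x * rel_y)

if __name__ == "__main__":
    # Toy numbers only: a noisy variable (19% of its variance is measurement
    # error) paired with an essentially error-free one.
    rel_noisy = reliability(observed_var=1.0, error_var=0.19)
    print(f"{disattenuate(0.45, rel_noisy, 1.0):.2f}")
```

Because each pairwise estimate is divided by a factor below one, individual corrected coefficients can exceed unity when reliabilities are poorly known, which is one reason a joint constraint on the whole matrix is needed in the multivariate case.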

The results of our PCA are summarized in Tables 1 and 2 in the form of the
$1/V_{\rm max}$-weighted correlation matrix, all its eigenvectors and eigenvalues, i.e., the
variances of the data in the directions of the principal components, as well as the rms
residuals between the 5-dimensional space (manifold) of the observations and the different
p-dimensional subspaces (p = 1, 5) that best describe them. Table 1, which presents the results
for disattenuated correlations, also lists the adopted estimate of the typical measurement
error and the square root of the reliability coefficient for the observables. In general, the
selected parameters have a near perfect reliability, except the color, for which the
attenuation correction is significant.

^3 The LDE-HQ dataset represents a volume-limited sample of more than 29,000 galaxies.

The correlation matrices indicate the existence of large linear correlations ($|r_P| > 0.60$)
for all pairs of variables except for the H I mass and color, which exhibit correlation
coefficients of medium size ($|r_P| \sim 0.45$-0.60; note, in contrast, the substantially higher
coefficients of the correlations between luminosity and color). Table 1 shows that the five
galactic attributes selected are all well correlated with the first principal component. The
latter is endowed with direction cosines of nearly equal absolute value
($\sim 1/\sqrt{5} \simeq 0.45$) and has an associated eigenvalue $\lambda_1$ of ~4.2, out of a
maximum possible of 5.0, implying that about 83% of the total variance in the adopted
5-parameter space can be explained by a single principal axis. The second, and already
statistically insignificant, principal component draws mostly from $M_{\rm HI}$ and accounts for
an additional ~10% of the global variance, while a linear subspace of three dimensions is
enough to explain practically all (~99%) of it (these numbers are reduced somewhat when we do
not use disattenuated correlation estimates; see Table 2). The principal 4-plane brings the
rms residuals down to the level of the adopted
observational errors with the exception of M^, whose variance cannot be fully accounted for. 
This might indicate either that there are still hidden parameters controlling the structure
of disk galaxies, such as perhaps the mass-normalized star formation rate, which is tightly
related to the light concentration index (e.g., Nikolic et al. 2004), or that the observational
error adopted for this variable has been too optimistic. Note also that the correlation matrix
reveals a very strong negative relationship between the r-band magnitude and isophotal 
diameter, which is responsible for the nearly null variance attached to the last eigenvector. 

Our finding that one single principal component — which does not appear to be domi- 
nated by any of the five major properties investigated — explains a great deal of the variance 
of the observed manifold suggests that the structure of regular gaseous galaxies should be 
determined by very few independent features. This is consistent with the results of earlier 
PCA-based studies that have attempted to elucidate the degree of organization shown by
disk galaxies (e.g., Brosche 1973; Bujarrabal et al. 1981; Conselice 2006), as well as in very
good agreement with the conclusion that HI galaxies lie essentially on a single fundamental
line reached in a recent study by Disney et al. (2008) that, like the present one, combines
homogeneous 21-cm data (from the HI Parkes All Sky Survey; HIPASS) with SDSS optical
measures.

Regarding the possibility, suggested by these latter authors, that HI-selected galaxies
have colors made up of two components, one systematic, correlated with the other variables
and the single principal component, and a so-called rogue component, which only aligns
with itself and therefore could act as a second significant parameter, we offer a different
explanation. Our results, based on a dataset about 8 times larger, though spanning a
smaller dynamic range, show that when the PCA is carried out on the attenuated (and
$1/V'_{\rm max}$-weighted) correlation matrix, we obtain a second principal component, PC2, well aligned with
color. Nevertheless, once the measurements are corrected for observational error the color
contribution to PC2 weakens significantly and is no longer the dominant one (compare the
corresponding eigenvectors in Tables 2 and 1, respectively). This leads us to conclude that
the possible statistically significant second degree of freedom found by Disney et al. (2008)
is actually an artifact produced by the substantial attenuation exerted by the measurement
error of the latter observable on the moderately strong intrinsic H I mass-color correlation
(see also Paper I).
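
The size of the attenuation effect can be checked with a short sketch. It assumes that the correction applied is the classical Spearman formula, in which the observed correlation is divided by the square root of the product of the reliabilities of the two variables, and that the tabulated quantity is the square root of each reliability; the function name `disattenuate` is hypothetical.

```python
import math

def disattenuate(r_obs, rel_x, rel_y):
    """Spearman correction for attenuation: divide the observed correlation
    by the square root of the product of the two reliabilities."""
    return r_obs / math.sqrt(rel_x * rel_y)

# Tabulated values: sqrt(reliability) is 1.00 for log M_HI and 0.79 for (g - r);
# the attenuated H I mass-color correlation is 0.46.
r_corrected = disattenuate(0.46, 1.00 ** 2, 0.79 ** 2)
print(round(r_corrected, 2))
```

Under these assumptions the corrected value rounds to the 0.58 listed for the disattenuated correlation, which shows how much of the apparent independence of color is due to measurement error alone.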
4.2. Standards of H I Content

We have just shown that the five observables we are dealing with are actually interconnected.
In this situation, any multiple regression model which attempts to describe the HI
mass in terms of interrelated predictor variables (multicollinearity) would be associated with
an ill-conditioned correlation matrix, i.e., a matrix whose inversion is numerically unstable
(if there were one or more exact linear relationships among the variables the matrix would
not be invertible). Thus, in the presence of multicollinearity the estimated impact of the individual
predictors on the response variable tends to be less precise than if the predictors were
uncorrelated with one another. To remedy this problem, we have adopted a two-stage procedure
known as Principal Component Regression (PCR; e.g., Cook 2007) that first carries out a
PCA of all the predictor variables, and then uses the resulting principal components (which
are mutually uncorrelated, and hence associated with a correlation matrix of full rank) together with
the dependent variable in an ordinary least squares regression fit. In addition, one can take
advantage of the initial PCA transformation to reduce the dimensionality of the data by
keeping only those new variables most correlated with the HI mass (e.g., Jolliffe 1982).
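
A minimal sketch of the two-stage PCR procedure follows. It is an illustration under stated assumptions, not the code used for this work: predictors are standardized, the PCA is taken from their correlation matrix, components are ranked by the absolute value of their correlation with the response, and no weighting or disattenuation is applied. The function name `pcr_fit` and the synthetic data are hypothetical.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal Component Regression.
    Returns (intercept, beta) expressed in the original predictor units."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd                                  # standardized predictors
    evals, evecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    scores = Z @ evecs                                 # uncorrelated components
    # Rank components by |correlation| with the response and keep the best ones.
    r = np.array([abs(np.corrcoef(scores[:, k], y)[0, 1])
                  for k in range(scores.shape[1])])
    keep = np.argsort(r)[::-1][:n_components]
    # Ordinary least squares of y on the retained components.
    A = np.column_stack([np.ones(len(y)), scores[:, keep]])
    gamma, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Map the component coefficients back to the original variables.
    beta = (evecs[:, keep] @ gamma[1:]) / sd
    intercept = gamma[0] - np.sum(beta * mu)
    return intercept, beta

# Synthetic example with two strongly collinear predictors.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2])
y = 1.0 + 2.0 * x1 - 1.0 * x2
intercept, beta = pcr_fit(X, y, n_components=2)
print(intercept, beta)
```

When all components are retained the fit coincides with ordinary least squares; dropping the weakest components trades a small bias for more stable coefficients.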



We have applied the above procedure to subspaces of increasing dimension, starting by
finding the regression relations between $M_{\rm HI}$ and each one of the four remaining properties
(see the plots above the diagonal of Figure 2) and then adding input variables progressively
to search for the combinations of regressors that best predict the HI content. This process is
stopped when, after adding a new predictor variable, the rms residual of the multiple regression
model increases or is not reduced by an amount comparable to or larger than the typical
observational error in $M_{\rm HI}$ quoted in Table 1.

In agreement with the results of the previous section, we find that the best predictions
for the HI mass are those depending on a single regressor variable. Table 3 lists, ordered
according to decreasing accuracy as given by the size of the rms residuals, the central values
and associated errors of the coefficients $a_i$ of the correlations

$\log(M_{\rm HI}/M_\odot) = a_0 + a_1 X$    (4)

for fits with and without $1/V'_{\rm max}$ weighting carried out on disattenuated data. It can be seen that
the most precise predictor of the H I mass is $D_{25,r}$, precisely the property most strongly correlated
with it, followed by $M_r$. Among the distance-independent observables, the best predictor
of the HI content is the rotational width of the disk, whereas the regression model using
the $(g - r)$ color is the least accurate, as expected. Of course, the use of the H I rotation
speed to estimate $M_{\rm HI}$ only makes sense when the 21-cm line flux integral is not available
and this predictor can be replaced by proxies like, for instance, the estimators defined in
Catinella et al. (2007) from optical rotation curves.

The uncertainties of the correlation coefficients quoted in Table 3 include two contributions.
The first one is the random error, calculated by adding in quadrature the statistical
error of the parameter estimates, based on 5000 non-parametric bootstrap trials of the correlations,
and the spread due to observational errors, which we have calculated through the
creation of 10,000 realizations of the correlations after assigning Gaussian random measurement
and distance errors to each galaxy.
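
The random-error budget described above can be sketched as follows. This is a simplified illustration: it uses an unweighted straight-line fit, perturbs only the two fitted variables with Gaussian measurement errors, and omits the distance errors and the $1/V'_{\rm max}$ weighting. The names `fit_line` and `random_error`, and the synthetic data, are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares straight line; returns [a0, a1] for y = a0 + a1*x."""
    a1, a0 = np.polyfit(x, y, 1)
    return np.array([a0, a1])

def random_error(x, y, sigma_x, sigma_y, n_boot=5000, n_mc=10000, seed=0):
    """Quadrature sum of (i) the bootstrap scatter of the fit coefficients and
    (ii) their scatter over Monte Carlo realizations of the measurement errors."""
    rng = np.random.default_rng(seed)
    n = len(x)
    boot = np.empty((n_boot, 2))
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample galaxies with replacement
        boot[k] = fit_line(x[idx], y[idx])
    mc = np.empty((n_mc, 2))
    for k in range(n_mc):
        mc[k] = fit_line(x + rng.normal(0.0, sigma_x, n),
                         y + rng.normal(0.0, sigma_y, n))
    return np.sqrt(boot.std(axis=0, ddof=1) ** 2 + mc.std(axis=0, ddof=1) ** 2)

# Small synthetic demonstration (trial counts reduced to keep it fast).
rng = np.random.default_rng(1)
x = rng.uniform(0.2, 1.4, size=300)
y = 8.7 + 1.25 * x + rng.normal(0.0, 0.2, size=300)
err_a0, err_a1 = random_error(x, y, sigma_x=0.05, sigma_y=0.02,
                              n_boot=500, n_mc=500)
print(err_a0, err_a1)
```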

The second error term is included to show the impact that voids (and overdensities
of comparable size present in the LDE-HQ sample) can have on the inferred relationships
by systematically undercounting (overcounting) galaxies in those regions. Remember that
we are dealing with galaxies in low-density environments and that the weighting scheme
described in Appendix A only corrects for the redshift-averaged density of galaxies in the
surveyed volume. This systematic error has been evaluated by dividing the sample into 8
equal-area sky regions of $\sim 15^\circ$ and calculating the correlations 8 times, leaving out a different
region each time. At the median survey redshift of $\sim 8000$ km s$^{-1}$, the adopted angular size
corresponds to about 30 Mpc, a scale comparable to the typical diameter of voids in the local
universe (Hoyle & Vogeley 2004). The contribution of galaxy under- and overdensities to the
variance in any correlation coefficient is measured through the jackknife error estimator
(Lupton 1993)

$\sigma_{a_i}^2 = \frac{N-1}{N} \sum_{j=1}^{N} (a_{ij} - \bar{a}_i)^2$ ,    (5)

where $N = 8$.
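
A delete-one-region jackknife of this kind can be sketched as below. The sketch assumes the standard jackknife variance, $(N-1)/N$ times the sum of squared deviations of the leave-one-out estimates from their mean, and an unweighted straight-line fit; the region labels, `fit_line`, and `jackknife_error` are hypothetical names introduced for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares straight line; returns [a0, a1] for y = a0 + a1*x."""
    a1, a0 = np.polyfit(x, y, 1)
    return np.array([a0, a1])

def jackknife_error(x, y, region, fit=fit_line):
    """Jackknife standard error of the fit coefficients, leaving out
    one sky region at a time."""
    labels = np.unique(region)
    n_reg = len(labels)
    # One set of coefficients per omitted region.
    est = np.array([fit(x[region != lab], y[region != lab]) for lab in labels])
    mean = est.mean(axis=0)
    var = (n_reg - 1) / n_reg * np.sum((est - mean) ** 2, axis=0)
    return np.sqrt(var)

# Synthetic demonstration with 8 regions of equal size.
rng = np.random.default_rng(2)
x = rng.uniform(0.2, 1.4, size=400)
y = 8.7 + 1.25 * x + rng.normal(0.0, 0.2, size=400)
region = np.arange(400) % 8
sys_a0, sys_a1 = jackknife_error(x, y, region)
print(sys_a0, sys_a1)
```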

Finally, we have also verified that combinations of two distance-dependent input variables,
which obey equations of the form

$\log(M_{\rm HI}/M_\odot) = a_0 + a_1 X_1 + a_2 X_2$ ,    (6)

do not help to reduce the spread that may arise from distance uncertainties. We
report in Table 3 the coefficients obtained using the optical size and luminosity as diagnostic
variables, which is also the only multilinear regression model with an rms residual as good
as that of the best linear model (but at the cost of using two predictors). Looking at the
coefficients of this multiple regression for weighted and unweighted data, it is clear that the
right-hand side of the equations shows a dependence on distance that neither is null (the
ratio $a_1/a_2 \neq -5$), as would be expected if they were defining a surface magnitude, nor
compensates for the $d^2$-dependence of the HI mass. As done previously for the single
regression models, we account for this breach of distance independence when calculating the
random error of the correlation coefficients.

4.3. Planar Correlation Diagrams



In the previous section, we were interested in obtaining predictions for one integral
property, the HI mass, from the observed values of four others: size, total magnitude,
velocity width, and color. In this section, we turn our attention to the constraints that
these intrinsic quantities put on the scaling relations among the fundamental attributes of
individual galaxies. This means that we now consider the five compiled variables on an equal
footing and focus on the correlations arising from the same PCA technique used in Section 4.1
applied to pairs of them. The quantities involved are therefore treated symmetrically, thus
minimizing the inconsistencies that may arise from the possible 'non-commutativity' of the
inferred relationships. The coefficients of the $1/V'_{\rm max}$-weighted and unweighted orthogonal
fits of the form $Y = a_0 + a_1 X$ between all 10 possible pairs of galaxy properties are presented
in Table 4 for disattenuated data. (For this exercise, the quoted uncertainties reflect only
the random error estimates calculated as in Section 4.2.) The scatter plots and their best
linear fits can be seen in the boxes below the diagonal in Figure 2.
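
A symmetric fit of this type can be sketched as a two-variable PCA. The sketch assumes that, as for the full five-variable analysis, the PCA acts on standardized variables (i.e., on the 2x2 correlation matrix), in which case the principal axis has slope of magnitude $\sigma_Y/\sigma_X$ and the sign of the correlation; disattenuation is not included, and the function name `orthogonal_fit` is hypothetical.

```python
import numpy as np

def orthogonal_fit(x, y, weights=None):
    """Symmetric linear fit Y = a0 + a1*X from the principal axis of the
    standardized (x, y) pair. Optional weights play the role of 1/V'max."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    mx, my = np.average(x, weights=w), np.average(y, weights=w)
    sx = np.sqrt(np.average((x - mx) ** 2, weights=w))
    sy = np.sqrt(np.average((y - my) ** 2, weights=w))
    r = np.average((x - mx) * (y - my), weights=w) / (sx * sy)
    # PCA of the 2x2 correlation matrix; the first eigenvector is the fit axis.
    evals, evecs = np.linalg.eigh(np.array([[1.0, r], [r, 1.0]]))
    v = evecs[:, np.argmax(evals)]
    a1 = (v[1] / v[0]) * sy / sx          # back to the original units
    a0 = my - a1 * mx
    return a0, a1

# Noise-free check: the fit recovers the input line.
x = np.linspace(0.2, 1.4, 30)
a0, a1 = orthogonal_fit(x, 8.5 + 1.5 * x)
print(a0, a1)
```

As a rough consistency check under the same assumption, the ratio of the tabulated standard deviations of $\log M_{\rm HI}$ and $\log D_{25,r}$, 0.36/0.23, is about 1.57, close to the weighted slope of 1.55 quoted for that pair.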

The study of the scalings of the most basic properties of galaxies is central for constraining
theories of their formation and evolution. It has generated an abundant literature, whose
detailed review far exceeds the scope of the present work. Instead, we have decided to focus
on the comparison between the values for the mean slopes of the strongest correlations we
predict, which are those in the $L(M)RV$ subspace of luminosity (mass), size, and rotation
speed, and those reported in other studies that also combine HI and optical observations
(HG84; Salpeter & Hoffman 1996), or that specifically study the scalings among the above
fundamental properties in late-type objects (Courteau et al. 2007). When comparing results
allowance should be made not just for differences in sample size, but also for other factors
such as the waveband of the optical observations, the specific observables chosen to estimate
the above attributes, their dynamic ranges, or the fitting method employed. Another complication
that distorts the comparison among the different outcomes is the incompleteness of
the datasets that, with the exception of ours, are all affected by intractable selection biases.

With all these caveats in mind, the agreement between the central values of the slopes
(rounded to two significant digits) reported by the different studies listed above
(see Table 5) can be classified as generally satisfactory. The largest discrepancies correspond
to the correlation between the luminosity and the rotation speed, i.e., the Tully-Fisher
(TF) relation, reflecting the fact that it is always problematic to determine accurately the slope of
this empirical law due to the low dynamic range in $\log V$. The mean values of the log slope
listed in Table 5 range from $\sim 2.6$ ($-6.5$ in magnitude units; HG84) to $\sim 3.4$ (Courteau et al.
2007) and $\sim 3.7$ (Salpeter & Hoffman 1996), the first and the last ones being measured at
blue wavelengths and that of Courteau et al. in the $I$-band. From our weighted data, we get
a slope of $3.3 \pm 0.2$ (random error) or, equivalently, $-8.2 \pm 0.5$ mag, which roughly falls in the
middle of this range and is fully consistent with estimates reported in TF-specific literature
for bright spirals in the blue/near-IR band (e.g., Willick et al. 1997; Giovanelli et al. 1997;
Courteau 1997; Masters et al. 2006).
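
The two ways of quoting the TF slope are related through the definition of magnitude, $M = -2.5 \log L + {\rm const}$, so that $L \propto V^{\gamma}$ corresponds to $dM/d\log V = -2.5\,\gamma$. A small helper (hypothetical names) makes the conversion explicit:

```python
def log_slope_to_mag_slope(log_slope):
    """Convert d(log L)/d(log V) into dM/d(log V), in magnitudes."""
    return -2.5 * log_slope

def mag_slope_to_log_slope(mag_slope):
    """Convert dM/d(log V), in magnitudes, into d(log L)/d(log V)."""
    return mag_slope / -2.5

print(log_slope_to_mag_slope(2.6))             # slope quoted for HG84
print(round(mag_slope_to_log_slope(-8.15), 1)) # weighted M_r-log W50 fit
```

The magnitude slope of -8.15 from the weighted orthogonal fit therefore corresponds to the log slope of about 3.3 quoted in the text.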



Among all the results obtained, the most striking is perhaps the relationship between
the H I mass and the characteristic size of the stellar distribution of gaseous galaxies, represented
here by the isophotal diameter in the $r$-band, $D_{25,r}$. Our finding that it has a central
slope $a = 1.55 \pm 0.06$ does not support the idea that all H I-rich galaxies have roughly the
same global H I column density, as recently advocated by Garcia-Appadoo et al. (2009; see
also references therein) from a sample of HIPASS galaxies and implied by correlations such
as the one found by Salpeter & Hoffman (1996), listed in Table 5, from optical size measurements
in the $R$-band. (Implicit in this conclusion is the assumption that HI and stellar
disk sizes are roughly proportional, as shown by Broeils & Rhee (1997).) We remind the
reader once again that none of these previous studies based their conclusions on correlation
analyses performed on complete multivariate datasets, as we have done here. In particular,
we suspect that the use of samples largely dominated by late-type spirals, which are more
prone to exhibit a nearly constant mean HI surface density, may contribute to exacerbate
the tendency to find such an aesthetically appealing result (see also Solanes et al. 1996). In
this regard, we wish to emphasize that the claimed constancy of the mean H I column density
by Garcia-Appadoo et al. (2009), who deal with a sample that, morphologically speaking,
is representative of the entire population of H I emitters, emanates from an (unweighted) H I
mass-$R_{50,g}$ correlation with an actual central slope of $\sim 1.7$, in fairly good agreement with
our outcome.
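
The link between the mass-size slope and the mean H I surface density can be made explicit: if $M_{\rm HI} \propto R^{a}$, then the hybrid surface density $M_{\rm HI}/R^2$ scales as $R^{a-2}$, and a constant surface density requires $a = 2$. The following sketch, with hypothetical function names, evaluates this for the slope found here:

```python
def surface_density_exponent(mass_size_slope):
    """Exponent of M_HI / R^2 versus R when M_HI is proportional to R**slope."""
    return mass_size_slope - 2.0

def surface_density_ratio(mass_size_slope, size_ratio):
    """Factor by which M_HI / R^2 changes when the size changes by size_ratio."""
    return size_ratio ** surface_density_exponent(mass_size_slope)

exponent = surface_density_exponent(1.55)
ratio_for_doubling = surface_density_ratio(1.55, 2.0)
print(exponent, ratio_for_doubling)
```

With a slope of 1.55 the exponent is negative, so the hybrid surface density drops by roughly a quarter each time the optical size doubles.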



5. SUMMARY AND CONCLUDING REMARKS 

We have searched for correlations among a large set of extensive, homogeneous 21-cm and
optical measures available for the 1624 members of a complete, HI flux-limited sample of
non-clustered, gas-rich galaxies, not influenced by their environment. Our main aim has been
to identify the combinations of intrinsic variables directly arising from observable quantities
that make up the best diagnostic tools for the HI content. The sources used for this research
have been selected from the Low Density Environment H I galaxy sample of the ALFALFA
blind HI survey defined in Paper I. The size of this primary database has been reduced to
include only high-quality ALFALFA detections with S/N > 6.5 found up to 15,000 km s$^{-1}$
and with moderate inclinations $0.85 > b/a > 0.2$ ($30^\circ < i < 80^\circ$ for $q = 0$). Furthermore,
we have selected only those HI emitters with an integrated flux > 1.3 Jy km s$^{-1}$, above which
the sensitivity of the survey becomes essentially independent of profile width.

The examination of the correlation structure of these data, conveniently weighted to 
compensate for the flux limitation, as well as for the systematic effects of large-scale structure 
and loss of signal due to man-made RFI in the surveyed volume, has produced the following 
interesting results. 



In (bright) gas-rich galaxies the isophotal r-band linear diameter, total r-band luminosity,
$W_{50}$ linewidth, and $(g - r)$ color are the galaxian properties most tightly correlated
with the total H I mass. The principal component analysis of the manifold defined by
these variables has revealed relationships with large correlation coefficients that are
suggestive of a high degree of organization in the LDE-HQ sample. This is consistent
with the idea that HI emitters behave essentially as a one-parameter family. As
previously noted by Disney et al. (2008) (see also van den Bergh 2008 and references
therein), the observed structural simplicity of disk galaxies is difficult to reconcile with
the prevailing theory of hierarchical galaxy formation, which holds that the physical
properties of these objects are determined by the interplay of several potentially independent
factors, such as the mass, spin, and merger history (chaotic at high redshift)
of galactic halos.

In accordance with the output of the PCA, we have found that the best predictions
for the most probable value of $M_{\rm HI}$, assuming that the regressor variables are precisely
known, are those depending on a single parameter. Our fits carried out on $1/V'_{\rm max}$-weighted
and disattenuated data show that the most accurate predictor for the HI mass of a
galaxy is its optical diameter, through the equation

$\log\left(\frac{M_{\rm HI}}{M_\odot}\right) = 8.72 \pm 0.06 \pm 0.06 + (1.25 \pm 0.06 \pm 0.07)\,\log\left(\frac{D_{25,r}}{\rm kpc}\right)$ ,    (7)

where the first error term is statistical and the second the systematic uncertainty
arising from the large-scale structure present in the surveyed volume. In Table 3
we provide alternative prescriptions to calculate $M_{\rm HI}$ from $M_r$, $W_{50}$, and $(g - r)$, as
well as from a combination of $D_{25,r}$ and $M_r$. The fact that the models based on the
crude morphological indicator $(g - r)$ yield rms residuals comparable to the global
standard deviation associated with $M_{\rm HI}$ hints that the morphology of HI emitters
plays a secondary role in determining their neutral gas content, as already inferred by
HG84 and Solanes et al. (1996).



From the joint distributions of the quantities most strongly correlated with the HI
mass, we derive the mean relationships

$M_{\rm HI} \propto R^{1.6}$, $M_{\rm HI} \propto L^{0.60}$, $L \propto V^{3.3}$, $R \propto L^{0.40}$, $R \propto V^{1.3}$ ,    (8)

where L represents the total r-band luminosity of the (old) stellar disk of a galaxy, and
R and V its size and rotation speed, respectively. Among the scaling relations that
involve HI measurements, the most interesting ones are the well-known LV or TF
relation and the relation between the total neutral gas mass and optical radius. For the
former, it is noteworthy that we find a central slope fully consistent with the typical
values reported in TF studies at optical/near-IR wavelengths from data that have not
been specifically selected for this task. On the other hand, the slope inferred for the
second scaling implies that the hybrid surface density of neutral hydrogen ($\propto M_{\rm HI}/R^2$)
is not constant, but moderately decreasing with galaxy size. Claims in favor of the
near universality of the global HI surface density for the entire spiral population rely
on incomplete datasets biased towards galaxies of the latest Hubble subtypes (mostly
Sc and Irr), for which the constancy of this intensive property is a relatively acceptable
approximation.
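
As an illustration of how the single-parameter prescription of Equation (7) can be used in practice, the sketch below evaluates its central values for a given optical diameter. The function name `predict_log_mhi` is hypothetical, and the quoted statistical and systematic errors on the coefficients are not propagated.

```python
import math

A0 = 8.72   # intercept of Equation (7)
A1 = 1.25   # slope of Equation (7)

def predict_log_mhi(d25_kpc):
    """Expected log10(M_HI / M_sun) for an r-band isophotal diameter in kpc,
    using the central values of Equation (7)."""
    if d25_kpc <= 0:
        raise ValueError("the diameter must be positive")
    return A0 + A1 * math.log10(d25_kpc)

# A disk with D25,r = 10 kpc: log10(M_HI / M_sun) = 8.72 + 1.25 = 9.97
print(round(predict_log_mhi(10.0), 2))
```

Comparing such an expectation value with the H I mass actually measured for a galaxy is the usual way of turning a standard of normalcy into a measure of gas excess or deficiency.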

To date, most multidimensional statistical studies focusing on the interrelations among 
the main properties of galaxies have had to contend with largely incomplete, heterogeneous 
samples of modest size and affected by important selection artifacts. It is therefore evident 
that disregarding any of these factors when they are in fact present may result in inconsistent 
estimation. In this respect, efforts like the present one based on the cross- correlation of large 
datasets assembled from objective, wide-area surveys with controlled sampling biases should 
mark the way forward. 
A. ASSESSING THE INTEGRATED HI FLUX COMPLETENESS OF THE ALFALFA DATA

Although the ALFALFA catalog and its LDE subset are noise-limited datasets, it is
possible to define an integrated flux limit, $F_{\rm HI,lim}$, above which the confirmed HI sources are
not subject on average to the same bias against broad linewidths.

With this aim, we have used the adaptation of the Rauzy (2001) completeness test to
an HI-selected galaxy sample by Zwaan et al. (2004), which we briefly recap here. Rauzy's
method, which is not affected by the presence of clustering or by subsampling in redshift
bins, relies on the calculation for each galaxy $i$ of the quantity

$\hat{\zeta}_i = \frac{r_i}{n_i + 1}$ ,    (A1)

which provides an unbiased estimate of the random variable $\zeta$ that compares the number
of galaxies with more and less neutral hydrogen than every galaxy in the sample under the
assumption that the shape of the HI mass function is universal (i.e., invariant in time and
position). In equation (A1), $r_i$ is the number of galaxies with $M_{\rm HI} > M_{{\rm HI},i}$ and $Z < Z_i$, $n_i$ is
the number of galaxies for which $M_{\rm HI} > M_{\rm HI,lim}(Z_i)$ and $Z < Z_i$, while $Z = \log F_{\rm HI} - \log M_{\rm HI}$
is a distance measure, and $M_{\rm HI,lim}(Z_i)$ is the limiting HI mass at the distance corresponding
to $Z_i$.

Taking into account that $\zeta$ is uniformly distributed between 0 and 1 with expectation
and variance $E_i = 1/2$ and $V_i = (n_i - 1)/[12(n_i + 1)]$, respectively, the principle of the test
is to evaluate the variation with decreasing $F_{\rm HI,lim}$ of the statistic

$T_C = \frac{\sum_{i=1}^{N_{\rm gal}} (\hat{\zeta}_i - 1/2)}{\left(\sum_{i=1}^{N_{\rm gal}} V_i\right)^{1/2}}$ ,    (A2)

which follows a Gaussian distribution of zero mean and unit variance under the null hypothesis,
$H_0$, that the sample is complete up to a given integrated flux and drops systematically
to values below zero when $H_0$ is not fulfilled. The completeness limit can therefore be set
by requiring that $T_C$ exceeds a given negative value, the standard decision rules being
$T_C < -2$ and $T_C < -3$, which correspond to 97.7% and 99.4% confidence levels of
rejection of $H_0$, respectively.
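
Once the counts $r_i$ and $n_i$ have been obtained for every galaxy, the statistic follows directly from the expectation and variance quoted above. The sketch below assumes the standard form of Rauzy's statistic, i.e., the sum of the deviations of $\hat{\zeta}_i$ from 1/2 divided by the square root of the summed variances; it takes the counts as input and does not implement the counting itself. The function name `rauzy_tc` is hypothetical.

```python
import numpy as np

def rauzy_tc(r, n):
    """T_C completeness statistic from the per-galaxy counts r_i and n_i."""
    r = np.asarray(r, dtype=float)
    n = np.asarray(n, dtype=float)
    zeta = r / (n + 1.0)                     # estimate of zeta for each galaxy
    var = (n - 1.0) / (12.0 * (n + 1.0))     # variance V_i of each estimate
    return np.sum(zeta - 0.5) / np.sqrt(np.sum(var))

# Counts sitting exactly at the expected value zeta = 1/2 give T_C = 0 ...
n_counts = np.array([9, 19, 29])
tc_balanced = rauzy_tc((n_counts + 1) / 2, n_counts)
# ... while a systematic deficit of counted galaxies drives T_C negative.
tc_deficit = rauzy_tc([2, 4, 6], n_counts)
print(tc_balanced, tc_deficit)
```

In an application, the statistic would be recomputed for subsamples truncated at progressively lower flux limits, and the completeness limit read off where it crosses the adopted rejection threshold.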
Figure 3 shows the results obtained when the test is applied to subsamples of both the
ALFALFA data without any density restriction (top panel) and the LDE dataset (bottom
panel) truncated to decreasing values of $F_{\rm HI,lim}$. Note that for both datasets we are considering
only code 1 sources within the bandpass limits 2000 and 15,000 km s$^{-1}$ (see Section 2), a
subsampling in redshift that should not affect the outcome. It is clear from this figure that
if we choose a $-3\sigma$ criterion to reject the completeness hypothesis, which corresponds to the
level from which the $T_C$ statistic begins a systematic, sharp decline, the completeness limit
of the two samples can be safely set at 1.3 Jy km s$^{-1}$.

A more direct, but less accurate, method to infer whether a survey is complete or not
to a given flux limit is to compute the average $V/V_{\rm max}$ of the data, which should be equal
to 1/2 provided that the effective search volume of each galaxy, or equivalently, its detection
probability, is accurately established. The $\langle V/V_{\rm max} \rangle$ test presupposes that the galaxies are
on average homogeneously distributed and is therefore sensitive to selection and large-scale
structure effects. Therefore, the correct application of this second technique to the above
HI datasets requires that in the calculation of the individual search volumes we account
for the true density of targets as a function of redshift, as well as for the artificial annular
underdensities that arise from man-made radio-frequency interference.
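
In its simplest form the test can be sketched as follows. The sketch assumes Euclidean geometry, a detection threshold that depends only on the integrated flux, and no upper distance cut, so that a source of flux $F$ could be seen out to $d_{\rm max} = d\,(F/F_{\rm lim})^{1/2}$ and $V/V_{\rm max} = (F/F_{\rm lim})^{-3/2}$. It includes neither the density nor the RFI weighting discussed in this Appendix, and the function names are hypothetical.

```python
import numpy as np

def v_over_vmax(flux, flux_lim):
    """Unweighted V/Vmax for sources above a pure flux limit.
    For fixed H I mass F scales as d**-2, hence V/Vmax = (F / F_lim)**-1.5."""
    flux = np.asarray(flux, dtype=float)
    if np.any(flux < flux_lim):
        raise ValueError("all sources must lie above the flux limit")
    return (flux / flux_lim) ** -1.5

def mean_v_over_vmax(flux, flux_lim):
    """Sample average of V/Vmax and its standard error."""
    ratio = v_over_vmax(flux, flux_lim)
    return ratio.mean(), ratio.std(ddof=1) / np.sqrt(len(ratio))

# A source exactly at the limit fills its search volume; one four times
# brighter occupies one eighth of it.
example = v_over_vmax([1.3, 5.2], 1.3)
print(example)
```

For a complete, homogeneously distributed sample the individual ratios are uniformly distributed between 0 and 1, so their mean should approach 1/2.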

Accordingly, we have modified $V$ and $V_{\rm max}$ in order to include weighting by the average
density interior to the corresponding radial distances normalized to the average density
within the surveyed volume. The weights have been calculated from a volume-limited subsample
of the spectroscopic SDSS DR7 dataset, which is complete to a Petrosian r-band
magnitude of 17.77, selected to include only galaxies obeying the disk criterion of Maller et al. (2009),
which is very effective at identifying objects structurally similar to HI emitters (see
Paper I). In addition, we have corrected for RFI by using the same average relative weight
as a function of observed heliocentric velocity depicted in Figure 6 of Martin et al. (2010).



In Figure 4 we depict the expectation value and dispersion of the average ratio of
weighted volumes, $\langle V'/V'_{\rm max} \rangle$, as a function of the integrated HI flux. As we have done
previously for Rauzy's test, we show results for the ALFALFA data (top panel) and its LDE
subset (bottom panel; in this case, the radial run of the density weighting is estimated using
Maller et al.'s disks in low density environments). Galaxies not belonging to the redshift
range 2000-15,000 km s$^{-1}$ or classified as code 2 detections have been discarded. In very
good agreement with the $T_C$ method, we find that above $F_{\rm HI,lim} \sim 1.3$ Jy km s$^{-1}$ the values
of this statistic remain, for the two datasets, practically constant around 0.50. Note that
the observed rough invariance of $\langle V'/V'_{\rm max} \rangle$ above this integrated-flux limit supports the
assertion that the two samples are statistically complete, while the fact that the observed
value of the statistic is so close to its expectation indicates that the effects of large-scale
structure and RFI have been correctly averaged out with the adopted weighting strategy
and, therefore, that we are dealing with homogeneous datasets.
Table 1. Principal Component Analysis of HI and Optical Properties for Weighted Data
and Disattenuated Correlations

                          log M_HI    log W_50  log D_25,r         M_r       (g-r)      C_59,r     (b/a)_r
log M_HI                      1.00        0.65        0.80       -0.74        0.58       -0.18        0.11
log W_50                                  1.00        0.71       -0.76        0.88       -0.22        0.08
log D_25,r                                            1.00       -0.96        0.85       -0.09        0.02
M_r                                                               1.00       -0.87        0.25       -0.00
(g-r)                                                                         1.00       -0.19        0.20
C_59,r                                                                                    1.00        0.10
(b/a)_r                                                                                               1.00
Mean (N_gal = 1624)           8.82        2.22        0.42      -17.43        0.33        0.44        0.50
Standard dev.                 0.36        0.18        0.23        1.48        0.12        0.05        0.16
Observational error           0.02        0.03        0.05        0.10        0.10        0.01        0.07
sqrt(R_xx)                    1.00        0.99        0.98        1.00        0.79        0.98        0.92

Eigenvectors                                                                            Eigenvalues (%)
e_1                           0.40        0.43        0.47       -0.47        0.46      λ_1 = 4.17 (83.31)
e_2                           0.73       -0.45        0.21       -0.03       -0.46      λ_2 = 0.49 (93.02)
e_3                           0.46        0.63       -0.43        0.43       -0.12      λ_3 = 0.30 (99.07)
e_4                          -0.08        0.21       -0.41       -0.75       -0.47      λ_4 = 0.05 (100)
e_5                          -0.29        0.40        0.62        0.17       -0.58      λ_5 = 0.00 (100)

rms residuals:
Principal Axis                0.22        0.09        0.09        0.51        0.09
Principal Plane               0.11        0.09        0.07        0.50        0.06
Princ. 3-Plane                0.06        0.05        0.05        0.42        0.06
Princ. 4-Plane                0.05        0.04        0.07        0.12        0.04
Princ. 5-Plane                0.00        0.00        0.00        0.00        0.00

Note. PCA is carried out on the first 5 properties only: log(M_HI [M_sun]), log(W_50 [km s^-1]),
log(D_25,r [kpc]), M_r [mag], and (g - r) [mag].
Table 2. Principal Component Analysis of HI and Optical Properties for Weighted Data

                          log M_HI    log W_50  log D_25,r         M_r       (g-r)      C_59,r     (b/a)_r
log M_HI                      1.00        0.64        0.78       -0.73        0.46       -0.17        0.10
log W_50                                  1.00        0.69       -0.75        0.69       -0.21        0.07
log D_25,r                                            1.00       -0.94        0.66       -0.09        0.02
M_r                                                               1.00       -0.69        0.24       -0.00
(g-r)                                                                         1.00       -0.15        0.14
C_59,r                                                                                    1.00        0.09
(b/a)_r                                                                                               1.00
Mean (N_gal = 1624)           8.82        2.22        0.42      -17.43        0.33        0.44        0.50
Standard dev.                 0.36        0.18        0.23        1.48        0.12        0.05        0.16

Eigenvectors                                                                            Eigenvalues (%)
e_1                           0.42        0.44        0.48       -0.48        0.41      λ_1 = 3.83 (76.57)
e_2                           0.61       -0.25        0.21       -0.09       -0.72      λ_2 = 0.57 (88.06)
e_3                          -0.25       -0.78        0.43       -0.31        0.23      λ_3 = 0.33 (94.60)
e_4                          -0.61        0.34        0.20       -0.45       -0.52      λ_4 = 0.22 (98.98)
e_5                          -0.14        0.15        0.71        0.67       -0.05      λ_5 = 0.05 (100)

rms residuals:
Principal Axis                0.20        0.09        0.08        0.48        0.10
Principal Plane               0.12        0.09        0.07        0.47        0.04
Princ. 3-Plane                0.10        0.03        0.04        0.39        0.04
Princ. 4-Plane                0.01        0.01        0.04        0.23        0.00
Princ. 5-Plane                0.00        0.00        0.00        0.00        0.00

Note. PCA is carried out on the first 5 properties only: log(M_HI [M_sun]), log(W_50 [km s^-1]),
log(D_25,r [kpc]), M_r [mag], and (g - r) [mag].



Table 3. Coefficients of M_HI Predictions from Single and Multiple Linear Regression Models

Weighting    X_1          X_2    a_0                  a_1                    a_2                    Residual
1/V'_max     log D_25,r          8.72 ±0.06 ±0.06     1.25 ±0.06 ±0.07                              ...
             M_r                 ...                  ... ±0.01 ±0.01                               ...
             log W_50            6.54 ±0.27 ±0.20     1.30 ±0.11 ±0.09                              0.28
             (g-r)               8.84 ±0.11 ±0.12     1.81 ±0.29 ±0.40                              0.33
             log D_25,r   M_r    7.26 ±0.12 ±0.04     0.66 ±0.03 ±0.01       -0.10 ±0.006 ±0.002    0.22
None         log D_25,r          8.85 ±0.04 ±0.03     1.37 ±0.04 ±0.03                              0.21
             M_r                 6.44 ±0.09 ±0.08     -0.20 ±0.004 ±0.002                           0.23
             log W_50            7.17 ±0.14 ±0.16     1.21 ±0.05 ±0.06                              0.28
             (g-r)               9.61 ±0.04 ±0.04     1.10 ±0.08 ±0.07                              0.32
             log D_25,r   M_r    6.89 ±0.05 ±0.02     0.61 ±0.01 ±0.005      -0.10 ±0.002 ±0.001    0.23

Table 4. Coefficients of Orthogonal Fits between Pairs of Variables

Weighting   Y            X = log D_25,r          X = M_r                   X = log W_50             X = (g-r)
                         a_0         a_1         a_0          a_1          a_0         a_1          a_0          a_1
1/V'_max    log M_HI     8.55±0.05   1.55±0.06   5.36±0.19    -0.24±0.010  5.01±0.30   1.99±0.13    8.45±0.12    2.99±0.29
            log D_25,r                           -2.05±0.09   -0.16±0.004  -2.28±0.21  1.29±0.14    -0.06±0.06   1.93±0.15
            M_r                                                            1.46±1.14   -8.15±0.48   -12.8±0.40   -12.2±0.98
            log W_50                                                                                1.73±0.06    1.50±0.13
None        log M_HI     8.58±0.03   1.66±0.03   5.24±0.08    -0.26±0.004  5.30±0.11   1.98±0.04    9.06±0.04    2.28±0.07
            log D_25,r                           -2.02±0.03   -0.16±0.002  -1.98±0.08  1.19±0.03    0.29±0.02    1.38±0.03
            M_r                                                            -0.24±0.42  -7.57±0.16   -14.6±0.13   -8.74±0.26
            log W_50                                                                                1.90±0.01    1.15±0.04
Table 5. Central Slopes of Scaling Laws between Fundamental Galaxian Properties
Reported by Different Authors

                                          Scaling law
Reference                    M_HI vs. R   M_HI vs. L   L vs. V   R vs. L   R vs. V
HG84 (1984)                     1.8          0.66        2.6
Salpeter & Hoffman (1996)       2.0          0.74        3.7       0.37      1.4
Courteau et al. (2007)                                   3.4       0.32      1.1
This work                       1.6          0.60        3.3       0.40      1.3
Fig. 1. The r-band axial ratio versus the radius enclosing 90 per cent of the r-band
light for galaxies in the ALFALFA LDE sample. Vertical lines correspond to the limits in
axial ratio adopted in the definition of the LDE-HQ sample (in red, objects with $F_{\rm HI} < 1.3$
Jy km s$^{-1}$, which we also exclude from the correlation analysis). The horizontal line, plotted
at $R_{90,r} = 10''$, marks the limiting radius above which the seeing effects disappear for optical
data, while the curved line shows a model in which the minimum possible semiminor axis
is $b = 1.3''$, an estimate of the 25th percentile best seeing in the SDSS (see Masters et al.
2010).

[Figure 2 panel axes: log M_HI [M_sun], log D_25,r [kpc], M_r [mag], log W_50 [km/s], (g-r) [mag]]

Fig. 2. Empirical relations for pairs of properties from LDE-HQ data. $1/V'_{\rm max}$-weighted
(solid) and unweighted (dotted) direct regression fits to the joint distributions are shown in
red color above the diagonal of the plot, whereas orthogonal fits are shown below it. All
correlations are corrected for attenuation (Equation [3]).

Fig. 3. Rauzy's test for completeness applied to data from the ALFALFA survey (top)
and from its LDE subset (bottom). We only consider code 1 sources with redshifts between
2000 and 15,000 km s$^{-1}$. In both cases, the HI flux completeness limit, $F_{\rm HI,lim}$ (determined
by $T_C = -3$), falls somewhat below 1.3 Jy km s$^{-1}$ (vertical dotted line).

Fig. 4. Same as in Figure 3 but for the $\langle V'/V'_{\rm max} \rangle$ completeness test.



